Our situated environment is full of uncertainty and highly dynamic, thus hindering the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real world scenarios. This means IDM should have the capability of continuously learning new skills and efficiently generalizing across wider applications. IDM benefits from any new approaches and theoretical breakthroughs that exhibit Artificial General Intelligence (AGI) breaking the barriers between tasks and applications. Recent research has well-examined neural architecture, Transformer, as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as a sequence decoding task using the Transformer architecture; this would be a promising solution to advance the applications of IDM in more complex real world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of a FDM in multi-agent game AI, production scheduling, and robotics tasks. Finally, through a case study, we demonstrate our realization of the FDM, DigitalBrain (DB1) with 1.2 billion parameters, which achieves human-level performance over 453 tasks, including text generation, images caption, video games playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step towards more autonomous and efficient real world IDM applications.
translated by 谷歌翻译
We propose a new neural network design paradigm Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of subnetworks, named columns respectively, between which multi-level reversible connections are employed. Such architectural scheme attributes RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are learned to be gradually disentangled when passing through each column, whose total information is maintained rather than compressed or discarded as other network does. Our experiments suggest that CNN-style RevCol models can achieve very competitive performances on multiple computer vision tasks such as image classification, object detection and semantic segmentation, especially with large parameter budget and large dataset. For example, after ImageNet-22K pre-training, RevCol-XL obtains 88.2% ImageNet-1K accuracy. Given more pre-training data, our largest model RevCol-H reaches 90.0% on ImageNet-1K, 63.8% APbox on COCO detection minival set, 61.0% mIoU on ADE20k segmentation. To our knowledge, it is the best COCO detection and ADE20k segmentation result among pure (static) CNN models. Moreover, as a general macro architecture fashion, RevCol can also be introduced into transformers or other neural networks, which is demonstrated to improve the performances in both computer vision and NLP tasks. We release code and models at https://github.com/megvii-research/RevCol
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
The Metaverse is deemed the next evolution of the Internet and has received much attention recently. Metaverse applications via mobile augmented reality (MAR) require rapid and accurate object detection to mix digital data with the real world. As mobile devices evolve, they become more potent in computing. Hence, their computational resources can be leveraged to train machine learning models. In light of the increasing concerns of user privacy and data security, federated learning (FL) has become a promising distributed learning framework for privacy-preserving analytics. In this article, FL and MAR are brought together in the Metaverse. We discuss the necessity and rationality of the combination of FL and MAR. The prospective technologies that power FL and MAR in the Metaverse are also identified. In addition, existing challenges that prevent the fulfilment of FL and MAR in the Metaverse and several application scenarios are presented. Finally, two case studies of Metaverse FL-MAR systems are demonstrated.
translated by 谷歌翻译
Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, the distant supervision may induce noisy labels, thus undermining the robustness of the learned models and restricting the practical application. To relieve this problem, recent works adopt self-training teacher-student frameworks to gradually refine the training labels and improve the generalization ability of NER models. However, we argue that the performance of the current self-training frameworks for DS-NER is severely underestimated by their plain designs, including both inadequate student learning and coarse-grained teacher updating. Therefore, in this paper, we make the first attempt to alleviate these issues by proposing: (1) adaptive teacher learning comprised of joint training of two teacher-student networks and considering both consistent and inconsistent predictions between two teachers, thus promoting comprehensive student learning. (2) fine-grained student ensemble that updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise. To verify the effectiveness of our proposed method, we conduct experiments on four DS-NER datasets. The experimental results demonstrate that our method significantly surpasses previous SOTA methods.
translated by 谷歌翻译
Accurate polyp segmentation is of great importance for colorectal cancer diagnosis and treatment. However, due to the high cost of producing accurate mask annotations, existing polyp segmentation methods suffer from severe data shortage and impaired model generalization. Reversely, coarse polyp bounding box annotations are more accessible. Thus, in this paper, we propose a boosted BoxPolyp model to make full use of both accurate mask and extra coarse box annotations. In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, which generate fine-grained polyp area through the iterative boosted segmentation model. To achieve this goal, a fusion filter sampling (FFS) module is firstly proposed to generate pixel-wise pseudo labels from box annotations with less noise, leading to significant performance improvements. Besides, considering the appearance consistency of the same polyp, an image consistency (IC) loss is designed. Such IC loss explicitly narrows the distance between features extracted by two different networks, which improves the robustness of the model. Note that our BoxPolyp is a plug-and-play model, which can be merged into any appealing backbone. Quantitative and qualitative experimental results on five challenging benchmarks confirm that our proposed model outperforms previous state-of-the-art methods by a large margin.
translated by 谷歌翻译
Federated Learning (FL) is pervasive in privacy-focused IoT environments since it enables avoiding privacy leakage by training models with gradients instead of data. Recent works show the uploaded gradients can be employed to reconstruct data, i.e., gradient leakage attacks, and several defenses are designed to alleviate the risk by tweaking the gradients. However, these defenses exhibit weak resilience against threatening attacks, as the effectiveness builds upon the unrealistic assumptions that deep neural networks are simplified as linear models. In this paper, without such unrealistic assumptions, we present a novel defense, called Refiner, instead of perturbing gradients, which refines ground-truth data to craft robust data that yields sufficient utility but with the least amount of privacy information, and then the gradients of robust data are uploaded. To craft robust data, Refiner promotes the gradients of critical parameters associated with robust data to close ground-truth ones while leaving the gradients of trivial parameters to safeguard privacy. Moreover, to exploit the gradients of trivial parameters, Refiner utilizes a well-designed evaluation network to steer robust data far away from ground-truth data, thereby alleviating privacy leakage risk. Extensive experiments across multiple benchmark datasets demonstrate the superior defense effectiveness of Refiner at defending against state-of-the-art threats.
translated by 谷歌翻译
Federated learning allows multiple clients to collaboratively train a model without exchanging their data, thus preserving data privacy. Unfortunately, it suffers significant performance degradation under heterogeneous data at clients. Common solutions in local training involve designing a specific auxiliary loss to regularize weight divergence or feature inconsistency. However, we discover that these approaches fall short of the expected performance because they ignore the existence of a vicious cycle between classifier divergence and feature mapping inconsistency across clients, such that client models are updated in inconsistent feature space with diverged classifiers. We then propose a simple yet effective framework named Federated learning with Feature Anchors (FedFA) to align the feature mappings and calibrate classifier across clients during local training, which allows client models updating in a shared feature space with consistent classifiers. We demonstrate that this modification brings similar classifiers and a virtuous cycle between feature consistency and classifier similarity across clients. Extensive experiments show that FedFA significantly outperforms the state-of-the-art federated learning algorithms on various image classification datasets under label and feature distribution skews.
translated by 谷歌翻译
With the development of deep learning and Transformer-based pre-trained models like BERT, the accuracy of many NLP tasks has been dramatically improved. However, the large number of parameters and computations also pose challenges for their deployment. For instance, using BERT can improve the predictions in the financial sentiment analysis (FSA) task but slow it down, where speed and accuracy are equally important in terms of profits. To address these issues, we first propose an efficient and lightweight BERT (ELBERT) along with a novel confidence-window-based (CWB) early exit mechanism. Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size. Afterward, a fast and high-accuracy FSA system is built. Experimental results show that the proposed CWB early exit mechanism achieves significantly higher accuracy than existing early exit methods on BERT under the same computation cost. By using this acceleration method, our FSA system can boost the processing speed by nearly 40 times to over 1000 texts per second with sufficient accuracy, which is nearly twice as fast as FastBERT, thus providing a more powerful text processing capability for modern trading systems.
translated by 谷歌翻译
与卷积神经网络(CNN)相比,视觉变压器(VIT)表现出了有希望的性能,但是VIT的训练比CNN难得多。在本文中,我们定义了几个指标,包括动态数据比例(DDP)和知识同化率(KAR),以研究训练过程,并将其分为三个时期:形成,增长和探索。特别是,在训练的最后阶段,我们观察到只有很小的训练示例用于优化模型。鉴于VIT的数据渴望的性质,我们提出了一个简单但重要的问题:在培训的每个阶段,是否有可能提供丰富的``有效''培训示例吗?为了解决这个问题,我们需要解决两个关键问题,即\ ie,如何衡量单个培训示例的``有效性'',以及如何系统地生成足够数量的``有效''示例。为了回答第一个问题,我们发现训练样本的``困难''可以作为衡量培训样本的``有效性''的指标。为了解决第二个问题,我们建议在这些演化阶段动态调整训练数据的``难度''分布。为了实现这两个目的,我们提出了一个新颖的以数据为中心的VIT培训框架,以动态测量训练样本的``难度'',并为不同培训阶段的模型生成``有效的''样品。此外,为了进一步扩大``有效''样品的数量,并减轻了VIT的后期训练阶段的过度拟合问题,我们提出了一种称为Patcherasing的补丁级擦除策略。广泛的实验证明了提出的以数据为中心的VIT培训框架和技术的有效性。
translated by 谷歌翻译